Large-scale sophisticated linguistic monitoring
Abstract
A key means of monitoring current and evolving events is through their linguistic signatures in text-based information sources (e.g., social media such as Twitter, Facebook, and Reddit). Given the vast quantities of available data, reliable automated analysis is vital for assisting human intelligence in detecting credible threats. This is especially true given both (i) the complexity of information communicated via language, and (ii) the complexity of the human communication process. Currently, automated analysis software capable of operating over large-scale datasets in real time is limited in its ability to both (i) reliably detect sophisticated non-linguistic information communicated via the linguistic signal, and (ii) reliably decipher the true meaning underlying a particular linguistic signal. As one example, consider subtle linguistic cues that communicate bias: one post may describe a group as “freedom fighters” while another describes them as “terrorists”. The connotation of the first expression differs sharply from that of the second, indicating the writer’s bias for or against the group, which itself may indicate sympathies or group affiliations. The emerging field of computational sociolinguistics (Nguyen, Doğruöz, Rosé, & de Jong, 2016) tackles the automatic detection of this kind of non-linguistic information from the linguistic signal (e.g., connotations: Rashkin, Singh, & Choi, 2015; identity: Pearl & Steyvers, 2012; Pearl, Lu, & Haghighi, 2017; mental state: Pearl & Steyvers, 2010, 2013; Pearl & Enverga, 2015; perspectives: Hardisty, Boyd-Graber, & Resnik, 2010; Card, Boydstun, Gross, Resnik, & Smith, 2015).
As another example, consider the complex reasoning process people use to understand language in context: if someone posts “oh yeah, i just want to murder that guy”, there are several possible interpretations. First, this post may be an example of hyperbole or exaggeration, with the writer negatively disposed towards the person in question but without plans to actually murder him. Second, this post may be an example of sarcasm or irony, with the writer positively inclined towards the person in question but using this expression to make a rhetorical point. Third, this post may be literally true and therefore a legitimate death threat that should be monitored. Recent approaches in the field of computational pragmatics draw on shared context and human processes of social reasoning (Frank & Goodman, 2012; Goodman & Stuhlmüller, 2013; Goodman & Frank, 2016) to identify the true interpretation behind a particular linguistic expression (hyperbole: Kao, Wu, Bergen, & Goodman, 2014; irony: Kao & Goodman, 2015; resolving ambiguity: Savinelli, Scontras, & Pearl, 2017; politeness: Yoon, Tessler, Goodman, & Frank, 2016; metaphor: Kao, Bergen, & Goodman, 2014; humor: Kao, Levy, & Goodman, 2015).